GRDD+: An Extended Greek Dialectal Dataset with Cross-Architecture Fine-tuning Evaluation
Chatzikyriakidis, Stergios, Papadakis, Dimitris, Papaioannou, Sevasti-Ioanna, Psaltaki, Erofili
We present an extended Greek Dialectal Dataset (GRDD+) that complements the existing GRDD dataset with more data from Cretan, Cypriot, Pontic, and Northern Greek, while adding six new varieties: Greco-Corsican, Griko (Southern Italian Greek), Maniot, Heptanesian, Tsakonian, and Katharevousa Greek. The result is a dataset with a total size of 6,374,939 words and 10 varieties. This is the first dataset with such variation and size to date. We conduct a number of fine-tuning experiments to assess the effect of good-quality dialectal data on a number of LLMs. We fine-tune three model architectures (Llama-3-8B, Llama-3.1-8B, Krikri-8B) and compare the results to frontier models (Claude-3.7-Sonnet, Gemini-2.5, ChatGPT-5).
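Fine-tuning on a corpus whose varieties differ greatly in size raises the question of how to sample across them. One common remedy is temperature-based sampling, which flattens the size distribution so small varieties are not drowned out. A minimal sketch follows; the per-variety word counts are purely illustrative placeholders, not the actual GRDD+ figures.

```python
# Temperature-based sampling over an imbalanced multi-variety corpus.
# The word counts below are ILLUSTRATIVE, not the real GRDD+ sizes.

def sampling_weights(sizes, temperature=0.7):
    """Flatten the raw size distribution: temperature=1.0 keeps
    proportions as-is; temperature -> 0 approaches uniform sampling."""
    total = sum(sizes.values())
    probs = {k: v / total for k, v in sizes.items()}
    scaled = {k: p ** temperature for k, p in probs.items()}
    norm = sum(scaled.values())
    return {k: s / norm for k, s in scaled.items()}

corpus_sizes = {  # illustrative word counts only
    "Cretan": 2_000_000,
    "Cypriot": 1_500_000,
    "Pontic": 900_000,
    "Tsakonian": 50_000,
}

weights = sampling_weights(corpus_sizes, temperature=0.5)
# Tsakonian's sampling share rises well above its ~1% share of raw words.
```

The weights can then drive per-batch sampling of training documents, trading some exposure to the largest varieties for better coverage of the smallest ones.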
Can AI mimic the human ability to define neologisms?
An ongoing and intriguing debate focuses on whether Large Language Models (LLMs) can replicate human language. The literature presents mixed evidence on this matter. Several studies suggest that LLMs can generate text closely resembling human language (Bubeck et al., 2023; Clark et al., 2021; Georgiou, 2025). However, the widely accepted concept of a universal grammar inherent in humans (Chomsky, 2000) challenges the idea that machine cognition can mirror human cognition. According to Chomsky et al. (2023), models like ChatGPT function as statistical engines driven by pattern recognition. Supporting this perspective, other studies highlight significant differences between human cognition and LLMs, which are reflected in language (Cai et al., 2024; Georgiou, 2024; Herbold et al., 2023). For instance, Georgiou (2024) examined how various linguistic components are represented in human-written and AI-generated texts, assessing the ability of ChatGPT to emulate human writing. The author found that despite AI-generated texts appearing to mimic human language, the results revealed significant differences across multiple linguistic features in the domains of phonology, grammar, and semantics.
Towards Systematic Monolingual NLP Surveys: GenA of Greek NLP
Bakagianni, Juli, Pouli, Kanella, Gavriilidou, Maria, Pavlopoulos, John
Natural Language Processing (NLP) research has traditionally been predominantly focused on English, driven by the availability of resources, the size of the research community, and market demands. Recently, there has been a noticeable shift towards multilingualism in NLP, recognizing the need for inclusivity and effectiveness across diverse languages and cultures. Monolingual surveys have the potential to complement the broader trend towards multilingualism in NLP by providing foundational insights and resources necessary for effectively addressing the linguistic diversity of global communication. However, monolingual NLP surveys are extremely rare in the literature. This study fills that gap by introducing a method for creating systematic and comprehensive monolingual NLP surveys. Characterized by a structured search protocol, it can be used to select publications and organize them through a taxonomy of NLP tasks. We include a classification of Language Resources (LRs), according to their availability, and of datasets, according to their annotation, to highlight publicly available and machine-actionable LRs. By applying our method, we conducted a systematic literature review of Greek NLP from 2012 to 2022, providing a comprehensive overview of the current state and challenges of Greek NLP research. We discuss the progress of Greek NLP and outline the Greek LRs encountered, classified by availability and usability. As we show, our proposed method helps to avoid common pitfalls, such as data leakage and contamination, and to assess language support per NLP task. We consider this systematic literature review of Greek NLP an application of our method that showcases the benefits of a monolingual NLP survey. Similar applications could address the myriad languages whose progress in NLP lags behind that of well-supported languages.
The Greek podcast corpus: Competitive speech models for low-resourced languages with weakly supervised data
Paraskevopoulos, Georgios, Tsoukala, Chara, Katsamanis, Athanasios, Katsouros, Vassilis
The development of speech technologies for languages with limited digital representation poses significant challenges, primarily due to the scarcity of available data. This issue is exacerbated in the era of large, data-intensive models. Recent research has underscored the potential of leveraging weak supervision to augment the pool of available data. In this study, we compile an 800-hour corpus of Modern Greek from podcasts and employ Whisper large-v3 to generate silver transcriptions. This corpus is utilized to fine-tune our models, aiming to assess the efficacy of this approach in enhancing ASR performance. Our analysis spans 16 distinct podcast domains, alongside evaluations on established datasets for Modern Greek. The findings indicate consistent WER improvements, correlating with increases in both data volume and model size. Our study confirms that assembling large, weakly supervised corpora serves as a cost-effective strategy for advancing speech technologies in under-resourced languages.
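The study above reports its gains in word error rate (WER): the word-level Levenshtein distance between a reference transcript and the system hypothesis, normalized by reference length. A minimal stdlib sketch of the standard dynamic-programming computation:

```python
# Word error rate: edit distance (substitutions, insertions, deletions)
# over word tokens, divided by the reference length.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all remaining reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)
```

In practice, evaluation toolkits also normalize punctuation and casing before scoring, which matters for silver transcriptions produced by a model such as Whisper.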
OYXOY: A Modern NLP Test Suite for Modern Greek
Kogkalidis, Konstantinos, Chatzikyriakidis, Stergios, Giannikouri, Eirini Chrysovalantou, Katsouli, Vassiliki, Klironomou, Christina, Koula, Christina, Papadakis, Dimitris, Pasparaki, Thelka, Psaltaki, Erofili, Sakellariou, Efthymia, Soupiona, Hara
This paper serves as a foundational step towards the development of a linguistically motivated and technically relevant evaluation suite for Greek NLP. We initiate this endeavor by introducing four expert-verified evaluation tasks, specifically targeted at natural language inference, word sense disambiguation (through example comparison or sense selection) and metaphor detection. More than language-adapted replicas of existing tasks, we contribute two innovations which will resonate with the broader resource and evaluation community. Firstly, our inference dataset is the first of its kind, marking not just \textit{one}, but rather \textit{all} possible inference labels, accounting for possible shifts due to e.g. ambiguity or polysemy. Secondly, we demonstrate a cost-efficient method to obtain datasets for under-resourced languages. Using ChatGPT as a language-neutral parser, we transform the Dictionary of Standard Modern Greek into a structured format, from which we derive the other three tasks through simple projections. Alongside each task, we conduct experiments using currently available state of the art machinery. Our experimental baselines affirm the challenging nature of our tasks and highlight the need for expedited progress in order for the Greek NLP ecosystem to keep pace with contemporary mainstream research.
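The "simple projections" mentioned above can be pictured as turning one structured dictionary entry into task instances. The sketch below derives a sense-selection item (pick the gloss matching an example sentence) from a toy record; the entry is a hypothetical stand-in, not taken from the Dictionary of Standard Modern Greek.

```python
# Projecting a structured dictionary entry into a sense-selection task.
# The entry is a HYPOTHETICAL toy record for illustration only.

import random

entry = {
    "lemma": "bank",
    "senses": [
        {"gloss": "financial institution",
         "example": "She deposited money at the bank."},
        {"gloss": "side of a river",
         "example": "They picnicked on the river bank."},
    ],
}

def sense_selection_instance(entry, gold_index, rng=None):
    """Pair one example sentence with all candidate glosses."""
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    candidates = [s["gloss"] for s in entry["senses"]]
    rng.shuffle(candidates)  # hide the gold position
    gold = entry["senses"][gold_index]["gloss"]
    return {
        "context": entry["senses"][gold_index]["example"],
        "choices": candidates,
        "label": candidates.index(gold),
    }

instance = sense_selection_instance(entry, gold_index=1)
```

The same record supports the other projections as well, e.g. pairing two example sentences of the same lemma for example-comparison WSD.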
GRDD: A Dataset for Greek Dialectal NLP
Chatzikyriakidis, Stergios, Qwaider, Chatrine, Kolokousis, Ilias, Koula, Christina, Papadakis, Dimitris, Sakellariou, Efthymia
In this paper, we present a dataset for the computational study of a number of Modern Greek dialects. It consists of raw text data from four dialects of Modern Greek: Cretan, Pontic, Northern Greek, and Cypriot Greek. The dataset is of considerable size, albeit imbalanced, and presents the first attempt to create large-scale dialectal resources of this type for Modern Greek dialects. We then use the dataset to perform dialect identification. We experiment with traditional ML algorithms, as well as simple DL architectures. The results show very good performance on the task, potentially revealing that the dialects in question have distinct enough characteristics to allow even simple ML models to perform well on the task. Error analysis is performed for the top-performing algorithms, showing that in a number of cases the errors are due to insufficient dataset cleaning.
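A classifier of the "traditional ML" kind used above can be as simple as character n-grams with Naive Bayes, which is a common strong baseline for dialect identification. The sketch below is stdlib-only; the exact features and models of the paper may differ, and any training text you feed it stands in for real dialectal data.

```python
# Minimal character n-gram Naive Bayes dialect identifier
# (Laplace-smoothed multinomial model over character trigrams).

from collections import Counter, defaultdict
import math

def char_ngrams(text, n=3):
    return [text[i:i + n] for i in range(len(text) - n + 1)]

class NaiveBayesDialectID:
    def __init__(self, n=3):
        self.n = n
        self.counts = defaultdict(Counter)  # dialect -> n-gram counts
        self.docs = Counter()               # dialect -> training doc count

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.docs[label] += 1
            self.counts[label].update(char_ngrams(text, self.n))
        self.vocab = {g for c in self.counts.values() for g in c}
        return self

    def predict(self, text):
        total_docs = sum(self.docs.values())
        best, best_lp = None, float("-inf")
        for label in self.docs:
            lp = math.log(self.docs[label] / total_docs)  # log prior
            denom = sum(self.counts[label].values()) + len(self.vocab)
            for g in char_ngrams(text, self.n):
                lp += math.log((self.counts[label][g] + 1) / denom)
            if lp > best_lp:
                best, best_lp = label, lp
        return best
```

Character n-grams are a natural fit here because Greek dialects differ in phonology-driven orthography (e.g. characteristic consonant clusters), which trigram counts capture without any linguistic preprocessing.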
Sample-Efficient Unsupervised Domain Adaptation of Speech Recognition Systems: A Case Study for Modern Greek
Paraskevopoulos, Georgios, Kouzelis, Theodoros, Rouvalis, Georgios, Katsamanis, Athanasios, Katsouros, Vassilis, Potamianos, Alexandros
Modern speech recognition systems exhibit rapid performance degradation under domain shift. This issue is especially prevalent in data-scarce settings, such as low-resource languages, where the diversity of training data is limited. In this work we propose M2DS2, a simple and sample-efficient finetuning strategy for large pretrained speech models, based on mixed source and target domain self-supervision. We find that including source domain self-supervision stabilizes training and avoids mode collapse of the latent representations. For evaluation, we collect HParl, a $120$ hour speech corpus for Greek, consisting of plenary sessions in the Greek Parliament. We merge HParl with two popular Greek corpora to create GREC-MD, a test-bed for multi-domain evaluation of Greek ASR systems. In our experiments we find that, while other Unsupervised Domain Adaptation baselines fail in this resource-constrained environment, M2DS2 yields significant improvements for cross-domain adaptation, even when only a few hours of in-domain audio are available. When we relax the problem to a weakly supervised setting, we find that independent adaptation for audio using M2DS2 and for language using simple LM augmentation techniques is particularly effective, yielding word error rates comparable to the fully supervised baselines.